On the sampling distribution of resubstitution and leave-one-out error estimators for linear classifiers
Abstract
Error estimation is a problem of high current interest in many areas of application. This paper concerns the classical problem of determining the performance of error estimators in small-sample settings under a parametric Gaussian assumption. We provide here, for the first time, the exact sampling distribution of the resubstitution and leave-one-out error estimators for linear discriminant analysis (LDA) in the univariate case; the result is valid for any sample size and any combination of parameters (including unequal variances and unequal sample sizes for the two classes). In the multivariate case, we provide a quasi-binomial approximation to the distribution of both the resubstitution and leave-one-out error estimators for LDA, under a common but otherwise arbitrary class covariance matrix, which is assumed to be known in the design of the LDA discriminant. We provide numerical examples, using both synthetic and real data, which indicate that these approximations are accurate, provided that the LDA classification error is not too large.
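As a rough illustration of the two estimators discussed in the abstract, the following Python sketch computes resubstitution and leave-one-out error estimates for a univariate two-class rule and approximates their sampling distributions by Monte Carlo simulation. It is not the paper's exact analytical result. It assumes that univariate LDA with equal priors reduces to assigning a point to the class with the nearer sample mean; all function names, parameter names, and default values are hypothetical.

```python
import numpy as np


def predict(x, mean0, mean1):
    """Assign each point to the class with the nearer sample mean (ties go to class 0).

    Assumption: this nearest-mean rule stands in for univariate LDA with equal priors.
    """
    x = np.asarray(x, dtype=float)
    return (np.abs(x - mean1) < np.abs(x - mean0)).astype(int)


def resubstitution_count(x0, x1):
    """Number of training points misclassified by the rule designed on all points."""
    m0, m1 = x0.mean(), x1.mean()
    return int(np.sum(predict(x0, m0, m1) != 0) + np.sum(predict(x1, m0, m1) != 1))


def leave_one_out_count(x0, x1):
    """Number of points misclassified when each is held out of the design in turn.

    Requires at least two points per class so that every reduced sample has a mean.
    """
    errors = 0
    for i in range(len(x0)):
        m0 = np.delete(x0, i).mean()  # class-0 mean without the held-out point
        errors += int(predict(x0[i], m0, x1.mean()) != 0)
    for i in range(len(x1)):
        m1 = np.delete(x1, i).mean()  # class-1 mean without the held-out point
        errors += int(predict(x1[i], x0.mean(), m1) != 1)
    return errors


def resubstitution_error(x0, x1):
    return resubstitution_count(x0, x1) / (len(x0) + len(x1))


def leave_one_out_error(x0, x1):
    return leave_one_out_count(x0, x1) / (len(x0) + len(x1))


def simulate_distributions(n0, n1, mu0, mu1, sd0, sd1, reps=2000, seed=0):
    """Monte Carlo approximation of P(estimate = k/n) for k = 0..n, n = n0 + n1.

    Samples are Gaussian with class-specific means and standard deviations,
    so unequal variances and unequal sample sizes are allowed.
    """
    rng = np.random.default_rng(seed)
    n = n0 + n1
    resub = np.zeros(n + 1)
    loo = np.zeros(n + 1)
    for _ in range(reps):
        x0 = rng.normal(mu0, sd0, n0)
        x1 = rng.normal(mu1, sd1, n1)
        resub[resubstitution_count(x0, x1)] += 1
        loo[leave_one_out_count(x0, x1)] += 1
    return resub / reps, loo / reps
```

Calling, for example, `simulate_distributions(10, 15, 0.0, 1.0, 1.0, 2.0)` returns two arrays of length 26 whose entry `k` estimates the probability that the corresponding error estimate equals `k/25`. Because both estimators take values only on the grid `k/n`, a probability mass function over that grid is a natural way to summarize them.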
Related articles
Exact performance of error estimators for discrete classifiers
Discrete classification problems abound in pattern recognition and data mining applications. One of the most common discrete rules is the discrete histogram rule. This paper presents exact formulas for the computation of bias, variance, and RMS of the resubstitution and leave-one-out error estimators, for the discrete histogram rule. We also describe an algorithm to compute the exact probabilit...
Exact representation of the second-order moments for resubstitution and leave-one-out error estimation for linear discriminant analysis in the univariate heteroskedastic Gaussian model
This paper provides exact analytical expressions for the bias, variance, and RMS for the resubstitution and leave-one-out error estimators in the case of linear discriminant analysis (LDA) in the univariate heteroskedastic Gaussian model. Neither the variances nor the sample sizes for the two classes need be the same. The generality of heteroskedasticity (unequal variances) is a fundamental fea...
Exact Performance of CoD Estimators in Discrete Prediction
The coefficient of determination (CoD) has significant applications in genomics, for example, in the inference of gene regulatory networks. We study several CoD estimators, based upon the resubstitution, leave-one-out, cross-validation, and bootstrap error estimators. We present an exact formulation of performance metrics for the resubstitution and leave-one-out CoD estimators, assuming the dis...
Joint Sampling Distribution Between Actual and Estimated Classification Errors for Linear Discriminant Analysis
Error estimation must be used to find the accuracy of a designed classifier, an issue that is critical in biomarker discovery for disease diagnosis and prognosis in genomics and proteomics. This paper presents, for what is believed to be the first time, the analytical formulation for the joint sampling distribution of the actual and estimated errors of a classification rule. The analysis presen...
Optimal convex error estimators for classification
A cross-validation error estimator is obtained by repeatedly leaving out some data points, deriving classifiers on the remaining points, computing errors for these classifiers on the left-out points, and then averaging these errors. The 0.632 bootstrap estimator is obtained by averaging the errors of classifiers designed from points drawn with replacement and then taking a convex combination of...
Journal title:
- Pattern Recognition
Volume 42, issue
Pages -
Publication date 2009